1. INTRODUCTION

Multivariate analysis can be thought of as a methodology
for detection, description and validation of structure in
p-dimensional (p > 1) point clouds. Classical multivariate
analysis relies on the assumption that the observations forming
the point cloud(s) have a Gaussian distribution. All information
about structure is then contained in the means and covariance
matrices, and the well-known apparatus for estimation and
inference in parametric families can be brought to bear. The
uncomfortable ingredient in this approach is the Gaussianity
assumption. The data may be Gaussian with occasional outliers,
or the bulk of the data might simply not conform to a
Gaussian distribution. The first case is the subject of robust
statistics and is not treated here. We discuss methods that
do not involve any distributional assumptions. In this case,


*Work supported by the Department of Energy under contract
DE-AC03-76SF00515

(Presented at Army Research Office Conference on Modern Data
Analysis, North Carolina, June 1980)
structure cannot be perceived by looking at a set of estimated
parameters. An obvious remedy is to look at the data themselves,
at the p-dimensional point cloud(s), and to base the
description of structure on those views. As perception in
more than three dimensions is difficult, the dimensionality of
the data first has to be reduced, most simply by projection.
Projection of the data generally implies loss of information.
As a consequence, multivariate structure does not usually show
up in all projections, and no single projection might contain
all the information. These points are further illustrated in
Chapter 2. It is therefore important to judiciously choose
the set of projections on which the model of the structure is
to be based. This is the goal of projection pursuit procedures.
A paradigm for multivariate analysis based on these
ideas is presented in Chapter 3.

By design, projection pursuit methods are ideally suited
for implementation on interactive computer graphics systems.
The potential of interaction between user and algorithm was
convincingly demonstrated in the PRIM-9 system for detection
of hypersurfaces and clustering (see Fisherkeller et al [1974]);
this system is discussed in Chapter 4. Procedures for multiple
regression and multivariate density estimation based on
projection pursuit are outlined in Chapters 5 and 6. Common
properties of all projection pursuit procedures are discussed in
Chapter 7.

2.  DETECTION  AND  DESCRIPTION  OF 
STRUCTURE  WITH  PROJECTIONS 

Our goal is to detect and describe multivariate structure
using projections of the data. However, structure, if present,
may not be apparent in all projections. This is illustrated
by the following examples. Figure 1 shows a point sample
drawn from a bivariate distribution. The apparent structure
of the point cloud (separation into two clusters) would be
revealed by projection onto the subspace spanned by the vector
(1, -1), whereas no structure would be apparent in a projection
on the subspace spanned by the vector (1, 1).

Fig. 1. Structured point cloud in two dimensions
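The Figure 1 situation can be imitated numerically. The sketch below is only an illustration: the cluster layout, the sample size, and the helper names `make_two_clusters`, `project` and `central_fraction` are assumptions made here, not the data or code behind the figure. It builds two elongated clusters offset along (1, -1) and measures how many projected points fall near the center of each one-dimensional projection; a nearly empty center indicates two separated groups.

```python
import numpy as np

def make_two_clusters(n_per_cluster=200, seed=0):
    """Two elongated clusters, offset from each other along (1, -1).

    The layout is hypothetical and only mimics the kind of cloud in Figure 1.
    """
    rng = np.random.default_rng(seed)
    n = 2 * n_per_cluster
    along = rng.normal(0.0, 2.0, size=n)      # spread along (1, 1)
    across = rng.normal(0.0, 0.3, size=n)     # spread along (1, -1)
    across[:n_per_cluster] += 1.5             # first cluster
    across[n_per_cluster:] -= 1.5             # second cluster
    u = np.array([1.0, 1.0]) / np.sqrt(2.0)
    v = np.array([1.0, -1.0]) / np.sqrt(2.0)
    return np.outer(along, u) + np.outer(across, v)

def project(points, direction):
    """Coordinates of the points along the unit vector parallel to `direction`."""
    d = np.asarray(direction, dtype=float)
    return points @ (d / np.linalg.norm(d))

def central_fraction(z, half_width=0.5):
    """Fraction of projected points within `half_width` of the mean."""
    return float(np.mean(np.abs(z - z.mean()) < half_width))

points = make_two_clusters()
print("central fraction along (1, -1):", central_fraction(project(points, (1, -1))))
print("central fraction along (1,  1):", central_fraction(project(points, (1, 1))))
```

Along (1, -1) the center of the projection is almost empty, so the two clusters are visible; along (1, 1) the projected points form a single unimodal heap.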

The data for Figure 2 are generated from the regression
model $Y = X_1 + X_2 + \epsilon$ with $(X_1, X_2)$ uniformly
distributed in $[-1,1] \times [-1,1]$ and $\epsilon \sim N(0, 0.01)$.
Figure 2a shows a projection on the two-dimensional subspace
spanned by $Y$ and the linear combination $Z = X_1 + X_2$. This
projection clearly shows the association between the predictors
$X_1$ and $X_2$ and the response $Y$. A similar plot with
$Z = X_1 - X_2$, Figure 2b, is clearly less structured.
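The contrast between the two projections can be checked with a few lines of simulation. This is a sketch under stated assumptions: the second argument of $N(0, 0.01)$ is read as a variance (standard deviation 0.1), the sample size of 500 is arbitrary, and the correlation coefficient is used here only as a crude numerical stand-in for looking at the plots.

```python
import numpy as np

def simulate_additive_model(n=500, noise_sd=0.1, seed=1):
    """Draw (X1, X2) uniformly on [-1, 1]^2 and set Y = X1 + X2 + noise."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=(n, 2))
    y = x[:, 0] + x[:, 1] + rng.normal(0.0, noise_sd, size=n)
    return x, y

def correlation_with_projection(x, y, direction):
    """Correlation between Y and the linear combination Z = direction . X."""
    z = x @ np.asarray(direction, dtype=float)
    return float(np.corrcoef(z, y)[0, 1])

x, y = simulate_additive_model()
corr_sum = correlation_with_projection(x, y, (1, 1))    # Z = X1 + X2
corr_diff = correlation_with_projection(x, y, (1, -1))  # Z = X1 - X2
print(f"corr(Y, X1 + X2) = {corr_sum:.3f}")
print(f"corr(Y, X1 - X2) = {corr_diff:.3f}")
```

The first correlation is close to one and the second close to zero, which mirrors the structured and unstructured views described for Figures 2a and 2b.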


Fig. 2. (a) Projection of data from model $Y = X_1 + X_2 + \epsilon$ on
plane spanned by $Y$ and $Z = X_1 + X_2$. ($Y$ is plotted on the vertical
axis.) (b) Projection of data from model $Y = X_1 + X_2 + \epsilon$ on plane
spanned by $Y$ and $Z = X_1 - X_2$.
These examples show that it is important to search for
structured projections. This process is called projection
pursuit.

It is easy to envision situations where not all the information
about the structure is contained in a single projection.
Consider the regression example above but with
$Y = X_1 \cdot X_2 + \epsilon$. Figures 3a and 3b show two
projections with $Z_a = X_1 + X_2$ and $Z_b = X_1 - X_2$. To
understand the pictures, note that the simple coordinate
transformation $Z_a = X_1 + X_2$, $Z_b = X_1 - X_2$ allows one to
express the response as $Y = 0.25\,(Z_a^2 - Z_b^2)$. It
is also interesting to notice that the quadratic dependence
on $Z_a$ is washed out due to variability caused by the dependence
on $Z_b$, and vice versa. This suggests that once a structured
projection has been found, the structure should be removed
so that one obtains a clearer view of what has not yet
been uncovered.
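The identity behind this change of coordinates can be written out in one step, with $Z_a = X_1 + X_2$ and $Z_b = X_1 - X_2$. The noise term $\epsilon$ is simply carried along:

```latex
\begin{aligned}
Z_a^2 - Z_b^2 &= (X_1 + X_2)^2 - (X_1 - X_2)^2 = 4\,X_1 X_2, \\
Y &= X_1 X_2 + \epsilon = 0.25\,(Z_a^2 - Z_b^2) + \epsilon .
\end{aligned}
```

Seen against $Z_a$ alone, the term $-0.25\,Z_b^2$ behaves like additional scatter around the parabola $0.25\,Z_a^2$, and the same holds with the roles of $Z_a$ and $Z_b$ exchanged.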


Fig. 3. (a) Projection of data from model $Y = X_1 \cdot X_2 + \epsilon$ on
plane spanned by $Y$ and $Z_a = X_1 + X_2$. (b) Projection of data
from model $Y = X_1 \cdot X_2 + \epsilon$ on plane spanned by $Y$ and
$Z_b = X_1 - X_2$.
3. A PROJECTION PURSUIT PARADIGM

The discussion in the previous section motivates the following
scheme for a class of procedures modeling structure in
multivariate data:

(i)   Choose an initial model.

Repeat

(ii)  Find a projection that shows deviation of the
data from the current model, indicating previously
undetected structure (Projection Pursuit).
(iii) Change the model to incorporate the structure
found in (ii) (Model Update).

Until the current model agrees with the data in all
projections.

Such projection pursuit procedures can be implemented in
batch mode. In this case, a figure of merit must be defined
which measures the amount of deviation between model and data
revealed in a projection. This figure of merit usually is
optimized by numerical search, although in some simple cases
optimization can be done analytically. If the optimum figure
of merit is less than a threshold, data and model are said to
agree. Batch implementations of projection pursuit regression
and density estimation are described in Sections 5 and 6.
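The batch form of the paradigm can be sketched as a short generic loop. Everything named here is a hypothetical illustration: `pursue_projection` uses a crude random search over unit directions as a stand-in for whatever numerical optimizer an implementation would use, and `figure_of_merit_for` and `update_model` are caller-supplied placeholders for the problem-specific parts.

```python
import numpy as np

def pursue_projection(figure_of_merit, p, n_candidates=2000, seed=0):
    """Numerical search: return the best of many random unit directions in R^p."""
    rng = np.random.default_rng(seed)
    candidates = rng.normal(size=(n_candidates, p))
    candidates /= np.linalg.norm(candidates, axis=1, keepdims=True)
    scores = np.array([figure_of_merit(a) for a in candidates])
    best = int(np.argmax(scores))
    return candidates[best], float(scores[best])

def projection_pursuit(figure_of_merit_for, update_model, model, p,
                       threshold, max_iter=20):
    """Repeat pursuit and model update until the optimum falls below threshold.

    figure_of_merit_for(model) must return a function of a direction that
    measures the deviation between data and `model` seen in that projection.
    update_model(model, direction) must return the updated model.
    """
    directions = []
    for _ in range(max_iter):
        direction, score = pursue_projection(figure_of_merit_for(model), p)
        if score < threshold:        # data and model are said to agree
            break
        model = update_model(model, direction)
        directions.append(direction)
    return model, directions
```

Keeping the search, the figure of merit and the update as separate pieces reflects the two basic operations of the paradigm; only the last two differ between regression and density estimation.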

By construction, projection pursuit procedures are ideally
suited for implementation on interactive computer graphics
systems. Interaction between program and user can help in

-  search for interesting projections

-  specification of model update

-  termination

-  interpretation of structure.

Although projection pursuit procedures are useful in batch
mode, their full power comes to bear in an interactive
environment.

4.  THE  PRIM-9  SYSTEM 

PRIM-9 (Fisherkeller et al [1974]) is a system for visual
inspection of up to nine-dimensional data, mainly intended
for detecting clusters and hypersurfaces. It was implemented
on an interactive computer graphics system which allows the
modification of pictures in real time and thus makes it
possible to generate movie-like effects. Its basic set of
operations consists of

Projection: The observations can be projected on a subspace
spanned by any pair of the coordinates; the projection
is shown on a CRT screen.

Rotation: A subspace spanned by any two of the coordinates
can be rotated. If the projection subspace and the
rotation subspace share a common coordinate, the rotational
motion causes the user to perceive a spatial picture
of the data as projected on the three-dimensional
subspace defined by the coordinates involved. When the
user terminates rotation in a particular plane, the old
coordinates in that plane are replaced by the current
(rotated) coordinates. This makes it possible to look at
completely arbitrary projections of the data, not necessarily
tied to the original coordinates.

Masking: Subregions of the p-dimensional observation
space can be specified, and only points inside the subregion
are displayed. Under rotation, points will enter
and leave the masked region.

Isolation: Points that are masked out (i.e., not visible)
can be removed, thus splitting the data into two subsets.
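The four operations map naturally onto small array manipulations. The following is a minimal sketch, not a description of how PRIM-9 was programmed: the function names, the use of NumPy, and the restriction of masking to axis-aligned boxes are all assumptions made for illustration.

```python
import numpy as np

def project(data, i, j):
    """Projection: keep the two coordinates i and j of every observation."""
    return data[:, [i, j]]

def rotate_plane(data, i, j, angle):
    """Rotation: rotate the plane spanned by coordinates i and j by `angle` radians.

    The result replaces the old coordinates i and j, so repeated rotations in
    different planes lead to projections not tied to the original axes.
    """
    out = data.copy()
    c, s = np.cos(angle), np.sin(angle)
    out[:, i] = c * data[:, i] - s * data[:, j]
    out[:, j] = s * data[:, i] + c * data[:, j]
    return out

def mask(data, lower, upper):
    """Masking: True for observations inside the box [lower, upper] (per coordinate)."""
    return np.all((data >= lower) & (data <= upper), axis=1)

def isolate(data, visible):
    """Isolation: split the data into the visible subset and the masked-out rest."""
    return data[visible], data[~visible]

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 9))                 # up to nine dimensions
rotated = rotate_plane(data, 0, 2, np.deg2rad(10.0))
view = project(rotated, 0, 1)                     # what would be drawn on screen
visible = mask(rotated, np.full(9, -1.5), np.full(9, 1.5))
inside, outside = isolate(rotated, visible)
```

Calling `rotate_plane` with a small angle once per picture update, followed by `project`, would give the movie-like effect described above.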

The first two operations, projection and rotation, allow
the user to perform what one might call "manual projection
pursuit". Isolation, the splitting of the data set into subsets,
provides a rudimentary form of structure removal. When
clustering is detected, the clusters can be separated and
each of them examined individually. This process can be
iterated.

Although  several  have  been  implemented  (Stuetzle  &  Thoma 
[1978],  Donoho  et  al  [1981]),  systems  like  PRIM-9  have  not 
yet  found  widespread  use.  The  main  reason  has  been  the  price 
of  the  necessary  computing  equipment.  The  processing  power 
needed  to  compute  rotations  at  a  reasonable  update  rate  is 
quite  high  (on  the  order  of  60000  multiplications  per  second 
for  1000  observations  and  10  updates  of  the  picture  per 
second).  Another  major  cost  has  been  the  graphics  device, 
which  must  have  a  sufficiently  high  bandwidth  (typically  a 
megabaud).  The  situation,  however,  is  rapidly  changing.  New 
16-bit  microprocessors  provide  a  speed  close  to  that  required 
for  an  interactive  use  of  projection  pursuit  procedures.  The 
price of graphics systems, especially raster scan devices, is
falling  dramatically.  The  graphics  system  at  SLAC  used  for 
the  implementation  of  PRIM-9  cost  $175,000  in  1967.  Today 
the  price  of  a  comparable  system  is  $15,000. 

5.  PROJECTION  PURSUIT  REGRESSION  (PPR) 

The goal of regression analysis is to find and describe
the association between a response variable $Y$ and predictor
variables $X_1, \ldots, X_p$, using a sample
$\{(y_i, x_i)\}_{i=1}^{n}$. PPR attempts
to construct a model for this association (or, in more classical
terms, to estimate $E(Y \mid X)$) from the information contained
in projections of the data on two-dimensional subspaces spanned
by $Y$ and a linear combination $Z = \alpha \cdot X$. The algorithm
exactly follows our projection pursuit paradigm:

(i)   Choose an initial model, for example $m_0(X) = \text{const}$.

Repeat

(ii)  Find a projection that shows deviation of the data
from the model, i.e., find a direction $\alpha$ such that
the current residuals, $r_i = y_i - m(x_i)$, show a
dependence on $Z = \alpha \cdot X$.

(iii) Describe this dependence by a smooth function $s(Z)$.
Update the model:

$$m(X) \leftarrow m(X) + s(\alpha \cdot X)$$

Until data and model agree in all projections.

The model after $M$ iterations has the form

$$m(X) = m_0(X) + \sum_{m=1}^{M} s_m(\alpha_m \cdot X). \qquad (1)$$

PPR allows the modeling of smooth but otherwise completely
general regression surfaces. So far, a batch version has
been implemented. Such an implementation requires the
specification of a figure of merit for projections and a method
for summarizing a smooth dependence ("smoother"). Smoothing
is generally accomplished by local averaging; the value of
the smooth $s$ at a particular point $z$ is obtained by averaging
the current residuals $r_i$ for those observations with values
of $z_i$ close to $z$. The size of the neighborhoods within which
averaging takes place is called the bandwidth of the smoother.
A smoother suitable for use with PPR and guidelines for
choosing the bandwidth are described and discussed in Friedman and
Stuetzle [1981].
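Local averaging can be sketched as a running mean over neighbors in the ordering of the projected values. This is a deliberately simple stand-in and not the smoother of Friedman and Stuetzle [1981]; the function name `running_mean_smooth` and the convention that `span` is the fraction of observations in each window are assumptions made here.

```python
import numpy as np

def running_mean_smooth(z, r, span=0.3):
    """Smooth r against z by averaging over a window of neighbors in z-order.

    `span` plays the role of the bandwidth: roughly this fraction of the
    observations enters each local average (fewer near the ends).
    Returns the smooth evaluated at every observation, in the input order.
    """
    z = np.asarray(z, dtype=float)
    r = np.asarray(r, dtype=float)
    n = len(z)
    k = max(1, int(round(span * n / 2)))      # half-window, in observations
    order = np.argsort(z)
    csum = np.concatenate(([0.0], np.cumsum(r[order])))
    lo = np.maximum(np.arange(n) - k, 0)      # window start (inclusive)
    hi = np.minimum(np.arange(n) + k + 1, n)  # window end (exclusive)
    smooth = np.empty(n)
    smooth[order] = (csum[hi] - csum[lo]) / (hi - lo)
    return smooth
```

A larger `span` gives a smoother but more biased curve; a smaller one follows the residuals more closely at the price of higher variance.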
A choice for the figure of merit is suggested by Figures
2a and 2b. The (inverse) figure of merit is taken to be the
residual sum of squares around the smooth of the current
residuals versus $\alpha \cdot X$. It is small in Figure 2a, where the
smooth could closely follow the observations, and large in
Figure 2b, where the smooth would be roughly constant. This
definition of the figure of merit implies that in each
iteration the model is updated along the direction for which the
update yields the biggest reduction in residual sum of squares.
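One iteration of this scheme can be sketched as follows. The code is an illustration under several assumptions: a running mean replaces the authors' smoother, a random search over unit directions replaces their numerical optimizer, and the names `smooth`, `rss_after_smooth`, `ppr_step` and `fit_ppr` are hypothetical.

```python
import numpy as np

def smooth(z, r, span=0.3):
    """Running-mean smooth of r against z (simplified stand-in smoother)."""
    n = len(z)
    k = max(1, int(round(span * n / 2)))
    order = np.argsort(z)
    csum = np.concatenate(([0.0], np.cumsum(r[order])))
    lo = np.maximum(np.arange(n) - k, 0)
    hi = np.minimum(np.arange(n) + k + 1, n)
    out = np.empty(n)
    out[order] = (csum[hi] - csum[lo]) / (hi - lo)
    return out

def rss_after_smooth(alpha, x, r, span=0.3):
    """Inverse figure of merit: residual sum of squares around the smooth."""
    s = smooth(x @ alpha, r, span)
    return float(np.sum((r - s) ** 2))

def ppr_step(x, r, n_candidates=500, span=0.3, seed=0):
    """One iteration: pick the direction with the smallest RSS, then update."""
    rng = np.random.default_rng(seed)
    cands = rng.normal(size=(n_candidates, x.shape[1]))
    cands /= np.linalg.norm(cands, axis=1, keepdims=True)
    scores = [rss_after_smooth(a, x, r, span) for a in cands]
    alpha = cands[int(np.argmin(scores))]
    s = smooth(x @ alpha, r, span)
    return alpha, s, r - s          # direction, smooth values, new residuals

def fit_ppr(x, y, n_terms=2, **kwargs):
    """Start from the constant model and add `n_terms` smooth terms."""
    r = y - y.mean()                # residuals from the initial model m_0 = const
    directions = []
    for _ in range(n_terms):
        alpha, _, r = ppr_step(x, r, **kwargs)
        directions.append(alpha)
    return directions, r
```

Because a direction and its negative give the same residual sum of squares, the sign of the returned `alpha` is arbitrary.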

As with any stepwise procedure, one needs a criterion for
stopping the iteration. Stopping too soon can increase the
bias of the estimate, while not stopping soon enough can
unduly increase its variance. "Optimal" termination of stepwise
procedures has been studied (see Stone [1981]); these
methods can be applied here. In practice, the iteration is
usually terminated subjectively, based on differences between
successive values of the residual sum of squares. In addition,
graphical inspection of $s(\alpha \cdot X)$ can be used to judge whether
the corresponding term should be included in the model. If
the graph of $s$ shows a noisy pattern with no systematic
tendency, then its inclusion can only increase the variability
of the estimate. On the other hand, a definite dependence
indicates that $s$ deals with an inadequacy of the present model.

The following example illustrates the operation of PPR. A
sample of 200 observations was generated according to the
model

$$Y = 10 \sin(\pi X_1 X_2) + 20 (X_3 - 0.5)^2 + 10 X_4 + 5 X_5 + 0 X_6 + \epsilon$$

with $(X_1, \ldots, X_6)$ uniformly distributed in $[-1,1]^6$ and
$\epsilon \sim N(0,1)$. Figure 4a shows $Y$ plotted against the best
single predictor, $X_4$, and the corresponding smooth. (The response
$Y$ is plotted on the vertical axis, $X_4$ on the horizontal axis.
The "+" symbols represent data points; numbers indicate more than
one data point. The smooth is represented by the symbols.)

Figure 4b shows $Y$ plotted against the linear combination
$\alpha_1 \cdot X$ found in the first iteration, with
$\alpha_1 = (0.41, 0.51, -0.04, 0.69, 0.31, 0.0)$. The association
is seen to be approximately linear. The model after the first
iteration thus is a plane which, in this case, closely coincides
with the least squares plane through the data. Figure 4c shows
the residuals from this model plotted against the second linear
combination $\alpha_2 \cdot X$ found by the algorithm, with
$\alpha_2 = (-0.14, 0.0, 0.99, 0.04, 0.0, -0.03)$. This iteration
is seen to incorporate the quadratic dependence of the response
on $X_3$ into the model. Figure 4d shows the residuals after two
iterations plotted against the third linear combination, with
$\alpha_3 = (0.70, 0.72, 0.01, 0.03, 0.02, 0.00)$. Figure 4e shows
the residuals after three iterations plotted against the fourth
linear combination, with
$\alpha_4 = (0.80, -0.59, -0.10, 0.04, 0.01, 0.0)$.
The last two iterations are seen to model the interaction
term $\sin(\pi X_1 X_2)$. A further iteration failed to
substantially improve the model.

For a more complete discussion of PPR and additional
examples, see Friedman and Stuetzle [1981].
6. PROJECTION PURSUIT DENSITY ESTIMATION (PPDE)

The goal of density estimation is to estimate the multivariate
distribution of a random vector $X$ on the basis of an
i.i.d. sample $x_1, \ldots, x_n$. Our procedure again follows the
projection pursuit paradigm:

(i)   Choose an initial model for the density, for example,
$m_0$ = multivariate normal with sample mean and covariance
matrix.

Repeat

(ii)  Find a projection that shows deviation of the data
from the model; i.e., find a direction $\alpha$ such that
$m(\alpha \cdot X)$, the model marginal along $\alpha$, differs from
$p(\alpha \cdot X)$, the (estimated) data marginal along $\alpha$.
(iii) Define an "augmenting function" $f(\alpha \cdot X)$ as the
quotient of data and model marginals,

$$f(\alpha \cdot X) = \frac{p(\alpha \cdot X)}{m(\alpha \cdot X)}.$$

Update the model so that it and the data agree in
the marginal along $\alpha$:

$$m(X) \leftarrow m(X) \cdot f(\alpha \cdot X).$$

Until data and model agree in all projections.

The model after $M$ steps of the iteration is of the form

$$m(X) = m_0(X) \prod_{m=1}^{M} f_m(\alpha_m \cdot X). \qquad (2)$$

In step (iii) of the algorithm, the marginal of the data
along $\alpha$ must be estimated and the marginal of the current
model must be computed. The data marginal presents no problem.
It can be estimated by projecting the data onto $\alpha$ and
using a one-dimensional kernel or near neighbor estimate.
The analytic computation of the model marginal can be very
difficult because it requires a (p-1)-dimensional integration.
We perform the integration by Monte Carlo, generating a sample
from the model and proceeding as in the estimation of the
data marginal.
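The two marginal estimates and their quotient can be sketched for the initial Gaussian model. The code below is an assumption-laden illustration: it uses a fixed-bandwidth Gaussian kernel estimate (the text allows kernel or near neighbor estimates), the bandwidth value is arbitrary, and the function names are hypothetical. It does not cover drawing samples from a model that has already been updated.

```python
import numpy as np

def gaussian_kde_1d(sample, grid, bandwidth):
    """One-dimensional Gaussian kernel density estimate evaluated on `grid`."""
    u = (grid[:, None] - sample[None, :]) / bandwidth
    norm = len(sample) * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * u ** 2).sum(axis=1) / norm

def gaussian_model_sample(data, n_draws, seed=0):
    """Monte Carlo sample from the initial model: normal with the sample
    mean and covariance matrix of the data."""
    rng = np.random.default_rng(seed)
    mean = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)
    return rng.multivariate_normal(mean, cov, size=n_draws)

def augmenting_function(data, model_sample, alpha, grid, bandwidth=0.3):
    """Quotient of the estimated data marginal and the Monte Carlo model
    marginal along the direction alpha, evaluated on `grid`."""
    alpha = np.asarray(alpha, dtype=float)
    alpha = alpha / np.linalg.norm(alpha)
    p_hat = gaussian_kde_1d(data @ alpha, grid, bandwidth)          # data marginal
    m_hat = gaussian_kde_1d(model_sample @ alpha, grid, bandwidth)  # model marginal
    return p_hat / m_hat
```

Both marginals are estimated in the same way from projected samples, so values of the quotient near one along a direction indicate that data and model agree in that projection. The grid should stay within the range where both samples have points, since the quotient is unstable in the tails.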

As in the case of PPR, only a batch version of PPDE has
so far been implemented. At each iteration, the direction $\alpha$
is chosen such that the update of the current model yields
the largest improvement in goodness-of-fit as measured by the
likelihood of the sample. Termination rules are analogous to
those used in PPR.

The following example illustrates the operation of PPDE.
The data for the example are the concentration levels of four
hormones in blood measurements of 256 children. The purpose
of applying PPDE to these data is to determine if a Gaussian
distribution represents a reasonable approximation to the
data density. Figures 5a-5d compare the experimental data to
a Monte Carlo sample drawn from a Gaussian density with the
sample mean and covariance, as projected onto each of the
measurement coordinates. The histogram of the experimental data
is drawn with solid lines; the histogram of the Monte Carlo
data is indicated by symbols. Inspection of these projections
indicates that although there are possibly some discrepancies,
a Gaussian density might be a reasonable approximation
to the data.

Figures 6a-6e show results for three iterations of PPDE.
The solution direction associated with the first iteration
is mainly a combination of the second and third coordinate
measurements. The data distribution (Figure 6a) is seen to be
somewhat skew and more peaked than the corresponding Gaussian.
The discrepancy between the data and the Gaussian model is
much more pronounced in this projection than on any of the
original coordinate measurements. Figure 6b plots the augmenting
function $f_1(\alpha_1 \cdot X)$.

The second linear combination mainly involves the third
and fourth coordinates. The principal difference between the
current model $p_2(X)$ and the data is seen to be a substantial
skewness to the left (Figure 6c). Figure 6d shows the
corresponding augmenting function $f_2$. The linear combination
associated with the third projection mostly involves the first and
second coordinates. Although this iteration is trying to
account for an apparent additional skewness of the data (Figure
6e), the effect is seen to be relatively small and perhaps not
significant.


Fig. 6. Hormone Data: (a) Histogram of 1st solution linear
combination $\alpha_1 = (0.02, 0.80, -0.59, 0.14)$ with Gaussian model
superimposed. (b) Augmenting function along 1st solution linear
combination $\alpha_1 = (0.02, 0.80, -0.59, 0.14)$. (c) Histogram of 2nd
solution linear combination $\alpha_2 = (-0.09, 0.18, -0.45, -0.87)$ with
current model Monte Carlo superimposed. (d) Augmenting function
along second linear combination $\alpha_2 = (-0.09, 0.18, -0.45, -0.87)$.
(e) Histogram of third solution linear combination
$\alpha_3 = (0.45, 0.88, 0.16, -0.02)$ with current model Monte Carlo
superimposed.

Application of PPDE to these data reveals that a Gaussian
model provides a considerably less adequate description than
indicated by the coordinate projections alone. The associated
graphics gives some insight into the nature of the nonnormality
of the data.

7. DISCUSSION

All projection pursuit procedures share some common
advantages:

-  Since all estimation is carried out in a univariate setting,
the large bias of kernel or near neighbor estimates in high
dimensions can often be avoided.

-  PP procedures do not require specification of a metric in
the observation space.

-  Bias is encountered with stepwise procedures when many terms
are required to provide a good representation of the model
underlying the data, but only a few can be estimated due to
insufficient sample size. In these cases, it is important
that the first few terms be able to approximate a wide
variety of functions so that the most salient features of
the data can be modeled. In the limit $M \to \infty$, any
regression function can be represented by (1), and any density
can be represented by (2) (independent of the initial model),
but even for moderate $M$, functions of those types constitute
rich classes. In addition, the choice of the initial model
permits the user to introduce any knowledge (s)he may have
concerning the data, thereby allowing a further reduction
in bias.

-  As a data-analytic tool, projection pursuit procedures provide
a set of directions $\alpha_1, \ldots, \alpha_M$ for exploring the
differences between the initial model and the data. The fact that
at each stage the direction is chosen for which the current
model least adequately describes the data makes them good
candidates for that purpose. A graphical comparison of the
projections of model and data, along with knowledge of the
initial model, can yield considerable insight into the
multivariate data distribution. Pictorial representations of
each of the augmenting functions $s_m$, respectively $f_m$, along
with the particular directions over which they vary, can
also be quite informative since it is these functions that
actually comprise the model.

There are situations in which projection pursuit procedures
can be expected not to perform well. Examples of regression
functions requiring a large number of terms in equation
(1) are those with multiple peaks. Examples of unfavorable
density functions are those with highly concave isopleths or
with spherically nested isopleths of the same density value.

In addition to regression and density estimation, the
projection pursuit paradigm can be applied to the problems of
classification and robust estimation of covariance matrices.
All projection pursuit procedures use the same set of basic
operations, projection pursuit and model update. This should
allow the design of an interactive system for analysis of
multivariate data that covers a wide range of problems and yet is
easy to learn and simple to operate.